9 research outputs found

    Benchmarking the Utility of -event Differential Privacy Mechanisms – When Baselines Become Mighty Competitors

    Get PDF
    The -event framework is the current standard for ensuring differential privacy on continuously monitored data streams. Following the proposition of-event differential privacy, various mechanisms to implement the framework were proposed. Their comparability in empirical studies is vital for both practitioners to choose a suitable mechanism and researchers to identify current limitations and propose novel mechanisms. By conducting a literature survey, we observe that the results of existing studies are hardly comparable and partially intrinsically inconsistent. To this end, we formalize an empirical study of -event mechanisms by a four-tuple containing re-occurring elements found in our survey. We introduce requirements on these elements that ensure the comparability of experimental results. Moreover, we propose a benchmark that meets all requirements and establishes a new way to evaluate existing and newly proposed mechanisms. Conducting a large-scale empirical study, we gain valuable new insights into the strengths and weaknesses of existing mechanisms. An unexpected – yet explainable – result is a baseline supremacy, i.e., using one of the two baseline mechanisms is expected to deliver good or even the best utility. Finally, we provide guidelines for practitioners to select suitable mechanisms and improvement options for researchers to break the baseline supremacy

    Trading Indistinguishability-based Privacy and Utility of Complex Data

    Get PDF
    The collection and processing of complex data, like structured data or infinite streams, facilitates novel applications. At the same time, it raises privacy requirements by the data owners. Consequently, data administrators use privacy-enhancing technologies (PETs) to sanitize the data, that are frequently based on indistinguishability-based privacy definitions. Upon engineering PETs, a well-known challenge is the privacy-utility trade-off. Although literature is aware of a couple of trade-offs, there are still combinations of involved entities, privacy definition, type of data and application, in which we miss valuable trade-offs. In this thesis, for two important groups of applications processing complex data, we study (a) which indistinguishability-based privacy and utility requirements are relevant, (b) whether existing PETs solve the trade-off sufficiently, and (c) propose novel PETs extending the state-of-the-art substantially in terms of methodology, as well as achieved privacy or utility. Overall, we provide four contributions divided into two parts. In the first part, we study applications that analyze structured data with distance-based mining algorithms. We reveal that an essential utility requirement is the preservation of the pair-wise distances of the data items. Consequently, we propose distance-preserving encryption (DPE), together with a general procedure to engineer respective PETs by leveraging existing encryption schemes. As proof of concept, we apply it to SQL log mining, useful for database performance tuning. In the second part, we study applications that monitor query results over infinite streams. To this end, -event differential privacy is state-of-the-art. Here, PETs use mechanisms that typically add noise to query results. First, we study state-of-the-art mechanisms with respect to the utility they provide. Conducting the so far largest benchmark that fulfills requirements derived from limitations of prior experimental studies, we contribute new insights into the strengths and weaknesses of existing mechanisms. One of the most unexpected, yet explainable result, is a baseline supremacy. It states that one of the two baseline mechanisms delivers high or even the best utility. A natural follow-up question is whether baseline mechanisms already provide reasonable utility. So, second, we perform a case study from the area of electricity grid monitoring revealing two results. First, achieving reasonable utility is only possible under weak privacy requirements. Second, the utility measured with application-specific utility metrics decreases faster than the sanitization error, that is used as utility metric in most studies, suggests. As a third contribution, we propose a novel differential privacy-based privacy definition called Swellfish privacy. It allows tuning utility beyond incremental -event mechanism design by supporting time-dependent privacy requirements. Formally, as well as by experiments, we prove that it increases utility significantly. In total, our thesis contributes substantially to the research field, and reveals directions for future research

    Benchmarking the Utility of w-Event Differential Privacy Mechanisms: When Baselines Become Mighty Competitors

    Get PDF
    The w-event framework is the current standard for ensuring differential privacy on continuously monitored data streams. Following the proposition of w-event differential privacy, various mechanisms to implement the framework are proposed. Their comparability in empirical studies is vital for both practitioners to choose a suitable mechanism, and researchers to identify current limitations and propose novel mechanisms. By conducting a literature survey, we observe that the results of existing studies are hardly comparable and partially intrinsically inconsistent. To this end, we formalize an empirical study of w-event mechanisms by re-occurring elements found in our survey. We introduce requirements on these elements that ensure the comparability of experimental results. Moreover, we propose a benchmark that meets all requirements and establishes a new way to evaluate existing and newly proposed mechanisms. Conducting a large-scale empirical study, we gain valuable new insights into the strengths and weaknesses of existing mechanisms. An unexpected - yet explainable - result is a baseline supremacy, i.e., using one of the two baseline mechanisms is expected to deliver good or even the best utility. Finally, we provide guidelines for practitioners to select suitable mechanisms and improvement options for researchers

    Swellfish Privacy: Exploiting Time-Dependent Relevance for Continuous Differential Privacy : Technical Report

    Get PDF
    Today, continuous publishing of differentially private query results is the de-facto standard. The challenge hereby is adding enough noise to satisfy a given privacy level, and adding as little noise as necessary to keep high data utility. In this context, we observe that privacy goals of individuals vary significantly over time. For instance, one might aim to hide whether one is on vacation only during school holidays. This observation, named time-dependent relevance, implies two effects which – properly exploited – allow to tune data utility. The effects are time-variant sensitivity (TEAS) and time-variant number of affected query results (TINAR). As today’s DP frameworks, by design, cannot exploit these effects, we propose Swellfish privacy. There, with policy collections, individuals can specify combinations of time-dependent privacy goals. Then, query results are Swellfish-private, if the streams are indistinguishable with respect to such a collection.We propose two tools for designing Swellfish-private mechanisms, namely, temporal sensitivity and a composition theorem, each allowing to exploit one of the effects. In a realistic case study, we show empirically that exploiting both effects improves data utility by one to three orders of magnitude compared to state-of-the-art w-event DP mechanisms. Finally, we generalize the case study by showing how to estimate the strength of the effects for arbitrary use cases

    Studying Utility Metrics for Differentially Private Low-Voltage Grid Monitoring

    Get PDF

    Distance-based data mining over encrypted data

    Get PDF

    Towards multi-purpose main-memory storage structures: Exploiting sub-space distance equalities in totally ordered data sets for exact knn queries

    Get PDF
    Efficient knn computation for high-dimensional data is an important, yet challenging task. Today, most information systems use a column-store back-end for relational data. For such systems, multi-dimensional indexes accelerating selections are known. However, they cannot be used to accelerate knn queries. Consequently, one relies on sequential scans, specialized knn indexes, or trades result quality for speed. To avoid storing one specialized index per query type, we envision multipurpose indexes allowing to efficiently compute multiple query types. In this paper, we focus on additionally supporting knn queries as first step towards this goal. To this end, we study how to exploit total orders for accelerating knn queries based on the sub-space distance equalities observation. It means that non-equal points in the full space, which are projected to the same point in a sub space, have the same distance to every other point in this sub space. In case one can easily find these equalities and tune storage structures towards them, this offers two effects one can exploit to accelerate knn queries. The first effect allows pruning of point groups based on a cascade of lower bounds. The second allows to re-use previously computed sub-space distances between point groups. This results in a worst-case execution bound, which is independent of the distance function. We present knn algorithms exploiting both effects and show how to tune a storage structure already known to work well for multi-dimensional selections. Our investigations reveal that the effects are robust to increasing, e.g., the dimensionality, suggesting generally good knn performance. Comparing our knn algorithms to well-known competitors reveals large performance improvements up to one order of magnitude. Furthermore, the algorithms deliver at least comparable performance as the next fastest competitor suggesting that the algorithms are only marginally affected by the curse of dimensionality

    Experimental Validation and Deployment of Observability Applications for Monitoring of Low-voltage Distribution Grids

    Get PDF
    Future distribution grids will be subjected to fluctuations in voltages and power flows due to the presence of renewable sources with intermittent power generation. The advanced smart metering infrastructure (AMI) enables the distribution system operators (DSOs) to measure and analyze electrical quantities such as voltages, currents and power at each customer connection point. Various smart grid applications can make use of the AMI data either in offline or close to real-time mode to assess the grid voltage conditions and estimate losses in the lines/cables. The outputs of these applications can enable DSOs to take corrective action and make a proper plan for grid upgrades. In this paper, the process of development and deployment of applications for improving the observability of distributions grids is described, which consists of the novel deployment framework that encompasses the proposition of data collection, communication to the servers, data storage, and data visualization. This paper discussed the development of two observability applications for grid monitoring and loss calculation, their validation in a laboratory setup, and their field deployment. A representative distribution grid in Denmark is chosen for the study using an OPAL-RT real-time simulator. The results of the experimental studies show that the proposed applications have high accuracy in estimating grid voltage magnitudes and active energy losses. Further, the field deployment of the applications prove that DSOs can gain insightful information about their grids and use them for planning purposes
    corecore